DETAILED ACTION

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Claims 1 to 8, 11 to 13, 15 to 17, 46 to 53, 56 to 58, 60 to 62, 64 to 66, and 69 to
71 are rejected under 35 U.S.C. 103(a) as being unpatentable over Young et al. in view
of Thelen et al. ('551).

Concerning independent claims 1, 46, and 64, Young et al. discloses a speech
recognition system and computer program, comprising: 

"receiving an input that specifies a context in which the speech recognition 
system processes speech" - different constraint grammars may be active at different 
times; a constraint grammar may be associated with a particular application program 
155 and may be activated when the user opens the application program and 
deactivated when the user closes the application program (column 4, lines 52 to 67: 
Figure 2); thus, opening an application corresponds to "receiving an input" from a user 
for activating a constraint grammar; one constraint grammar 225 that may be used by 
the speech recognition software 160 is a large vocabulary dictation grammar (column 5, 
lines 55 to 63: Figure 2); each dictation topic has its own vocabulary file (e.g., "medical"
or "legal") (column 6, lines 33 to 40: Figure 2); thus, a constraint grammar relating to a
large vocabulary dictation grammar or a dictation topic vocabulary file "specifies a 
context" related to the content of what words the speech recognition software expects it 
will hear; 

"creating a context-enhanced database using information derived from said input" 
- one constraint grammar 225 that may be used by the speech recognition software 160 
is a large vocabulary dictation grammar; a large vocabulary dictation grammar identifies 
words in the active vocabulary (column 5, lines 55 to 63: Figure 2); each dictation topic 
has its own vocabulary file (e.g., "medical" or "legal") (column 6, lines 33 to 40: Figure 2);
vocabulary files for an active vocabulary or a vocabulary file for a dictation topic is a 
"context-enhanced database" based upon which application program the user has 
opened; 

"preparing a first textual output from a speech signal by performing a speech 
recognition task to convert said speech signal into said first textual output, wherein said 
context-enhanced database is accessed to improve the speech recognition rate, 
wherein said speech signal is parsed into a plurality of computer processable speech 
segments, wherein said first textual output comprises a plurality of text segments, each 
corresponding to one of the computer processable speech segments, and wherein 
selected ones of the text segments are generated by matching a computer processable 
speech segment against an entry within the context-enhanced database, said context- 
enhanced database including a plurality of entries, each entry comprising a speech 
utterance and a corresponding textual segment for the speech utterance" - recognizer 
215 receives and processes frames ("parsed into a plurality of computer processable
speech segments") of an utterance to identify text ("a first textual output") corresponding 
to the utterance ("said speech signal"); scores represent how well frames of an 
utterance match text hypotheses (column 4, lines 34 to 51: Figure 2); recognizer 215
processes frames 210 of an utterance in view of one or more constraint grammars 225 
for placing a limitation on the order or grammatical form of the words ("a plurality of text 
segments") (column 4, lines 62: Figure 2); a constraint grammar can include a language 
model for an active vocabulary or dictation topic vocabulary file (column 5, line 56 to 
column 6, line 40: Figure 2); a language model for a vocabulary file improves a speech 
recognition rate by matching entries of utterances with corresponding words; 

"enabling editing of said first textual output to generate a final voice-generated 
output" - a user may invoke an appropriate correction command when the system 
makes a recognition error (column 16, lines 26 to 65: Figures 13A to 13N); 

"making said final voice-generated output available" - best-scoring recognition 
candidates corresponding to dictated text are provided to an active application, such as 
a word processor, and are displayed through a graphical user interface (column 15, 
lines 17 to 24: Figure 2). 

Concerning independent claims 1, 46, and 64, Young et al. discloses active 
vocabularies that change based upon active applications currently executing upon the 
computer system, but omits a pre-processing step that defines content for voice- 
generated output by extracting content from electronic documents based upon at least 
one of text contained in an e-mail sent or received by the user, information in a 
document attached to an e-mail sent or received by the user, information in a document
viewed by the user on a display of the computer system, information in a plurality of 
linked documents accessible to the computer system, information in a spread sheet 
executing on the computer system, call center information received via a facsimile 
device connected to the computer system, call center information received via a calling 
device connected to the computer system, and information recorded by a web browser 
executing on the computer system. However, Thelen et al. ('551) discloses a system
for creating a vocabulary and/or language model for a speech recognition system from a 
set of documents based on a search criterion (Abstract), comprising: 

"the input, at least in part, being automatically derived in a pre-processing step 
that defines content for a voice-generated output that is expected to be generated by a 
user of a computer system upon which the speech recognition system executes, the 
input being based upon at least one of text contained in an e-mail sent or received by 
the user, information in a document attached to an e-mail sent or received by the user, 
information in a document viewed by the user on a display of the computer system, 
information in a plurality of linked documents accessible to the computer system, 
information in a spread sheet executing on the computer system, call center information 
received via a facsimile device connected to the computer system, call center 
information received via a calling device connected to the computer system, and 
information recorded by a web browser executing on the computer system" - a 
vocabulary and/or language model is created by selecting documents from a set of 
documents based on a search criterion; by searching for documents based on a search 
criterion derived from a context identifier, pertinent documents are collected in an
effective manner, increasing the quality of recognition; in one embodiment, the context
identifier comprises one or more keywords, which act as a search criterion, based on
which the documents are selected; in another embodiment, the set of documents is 
formed by a document database or document file system in a distributed computer 
system; this allows for centrally storing (e.g. in a server) a larger set of documents than 
would normally be feasible to store or provide to a client computer; alternatively, a very 
large set of documents may be distributed over several servers, as over the Internet 
(column 3, line 20 to column 4, line 27; column 6, lines 11 to 45); thus, content for a
vocabulary and/or language model of a speech recognition system is created 
("automatically derived") from keywords acting as a search criterion for a context 
identifier ("specifies a context"), where the content is at least derived from a distributed 
computer system or a set of documents distributed over several servers or the Internet 
("information in a plurality of linked documents accessible to the computer system"); a 
set of documents distributed over several servers is "a plurality of linked documents". 
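
For illustration only, the keyword-driven selection described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Thelen et al. ('551): the function name `build_vocabulary`, the whole-word matching rule, and the simple tokenizer are all hypothetical choices made for the example.

```python
import re
from typing import Iterable, Set


def build_vocabulary(documents: Iterable[str], keywords: Iterable[str]) -> Set[str]:
    """Collect the words of every document that matches at least one keyword.

    Hypothetical helper: the keywords play the role of the search criterion
    derived from a context identifier; matching is by whole lowercase word.
    """
    wanted = {k.lower() for k in keywords}
    vocabulary: Set[str] = set()
    for text in documents:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if tokens & wanted:        # document satisfies the search criterion
            vocabulary |= tokens   # its words join the vocabulary
    return vocabulary


docs = ["The radiologist read the scan", "Stock prices fell"]
vocab = build_vocabulary(docs, ["radiologist"])
```

Only the first document matches the keyword, so the resulting vocabulary stays small and excludes the unrelated financial terms, which is the effect the reference is cited for.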

Concerning independent claims 1, 46, and 64, Thelen et al. ('551) teaches that
creating a vocabulary and/or language model from a set of documents distributed over 
several servers of the Internet has an advantage of increasing the quality of recognition 
by ensuring that pertinent language elements are covered, and excluding many 
irrelevant language elements, leading to faster recognition, and creation of a relatively 
small vocabulary or language model. (Column 3, Lines 26 to 43) Thus, it is suggested 
that documents relevant for a specific category of user, such as a radiologist, a surgeon, 
or a legal practitioner, can be created. (Column 3, Lines 11 to 20) It would have been
obvious to one having ordinary skill in the art to create a vocabulary and/or language 
model from information in a plurality of linked documents accessible to the computer 
system for a speech recognition system as taught by Thelen et al. ('551) in the speech 
recognition and computer program of Young et al. for a purpose of increasing the quality 
of recognition by ensuring that pertinent language elements are covered, and excluding 
many irrelevant language elements, leading to faster recognition, and creation of a 
relatively small vocabulary or language model. 

Concerning claims 2, 7, 47, and 52, Young et al. discloses speech recognition for 
dictation of words of text. 

Concerning claims 3 to 5, 15, 48 to 50, 60, and 65 to 66, Young et al. discloses a 
complete dictation vocabulary consists of an active vocabulary plus a backup dictionary 
245; a system-wide backup dictionary contains all words known to the system; word 
searches of the backup vocabularies start with the user-specific backup dictionary and 
then check the system-wide backup dictionary ("before another database is searched") 
("a second database is accessed to find a matching word ... for which no matching
word was found"); a user may add a word to a dictation vocabulary and a user-specific 
backup vocabulary ("the context-enhanced database is created from said input and from 
entries within the second database") (column 15, line 51 to column 16, line 25). 
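
The search order described above can be illustrated with a minimal sketch. The names `find_word`, `active`, `user_backup`, and `system_backup` are hypothetical, and checking the active vocabulary before the backup dictionaries is an assumption of this example; the passage above states only that the user-specific backup dictionary is searched before the system-wide one.

```python
from typing import Optional, Set


def find_word(word: str,
              active: Set[str],
              user_backup: Set[str],
              system_backup: Set[str]) -> Optional[str]:
    """Return the name of the first vocabulary containing the word, else None."""
    search_order = (
        ("active", active),                # assumed to be checked first
        ("user_backup", user_backup),      # user-specific backup dictionary
        ("system_backup", system_backup),  # system-wide backup dictionary
    )
    for name, vocabulary in search_order:
        if word in vocabulary:
            return name
    return None


source = find_word("stent", {"the", "patient"}, {"stent"}, {"stent", "zebra"})
```

Here the word is present in both backup dictionaries, and the lookup reports the user-specific one because it is consulted first.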

Concerning claims 6 and 51, Young et al. discloses that at least (c) and (d) and
(e) are performed concurrently as recognized text is displayed during dictation and 
editing (column 15, line 13 to column 16, line 65: Figure 2). 

Concerning claims 8 and 53, Young et al. discloses speech recognition is
performed in conjunction with a particular application (e.g., Microsoft Word™), and
updating the active vocabulary to include a constraint grammar associated with the 
application and a dictation vocabulary (column 15, lines 31 to 66: Figure 2); thus, 
speech recognition is performed "in light of entries included in" a dictation vocabulary 
("said context-enhanced database"). 

Concerning claims 11, 56, and 69, Thelen et al. ('551) discloses that a context
identifier can consist of a set of keywords, or a sequence of words, which act as a 
search criterion to search for and select a training corpus for a vocabulary and/or 
language model of a speech recognition system (column 3, lines 43 to 58); a set of 
keywords for selecting documents from a larger set of documents is equivalent to "a
word list" for "creating the context-enhanced database from those entries of a context- 
independent database", respectively. 

Concerning claims 12 to 13 and 57 to 58, Young et al. discloses displaying text 
on a graphical user interface of a word processor (column 15, lines 17 to 24: Figure 2); 
text is temporarily stored in memory 145 of a computer 125 (column 3, lines 44 to 48: 
Figure 1). 

Concerning claims 16 to 17, 61 to 62, and 70 to 71, Young et al. discloses that
when a particular application is opened ("detecting an event") ("automatically detecting 
a change"), a new constraint grammar is activated ("automatically deriving new input"),
and the control interface updates the active vocabulary ("responsively updating said 
context-enhanced database") (column 4, lines 62 to 67: Figure 2; column 15, lines 31 to 
38). 

Claims 14 and 59 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Young et al. in view of Thelen et al. ('551) as applied to claims 1 and 46 above, 
and further in view of Mitchell et al. 

Young et al. does not expressly disclose the features of highlighting words 
having a predetermined likelihood of misinterpretation. However, Mitchell et al. teaches 
highlighting words on a display for which a score is less than a threshold score. 
(Column 10, Lines 12 to 18: Figure 8b: Steps S72 and S73) It is suggested that an 
advantage is a processing means that permits any application running on a processor 
that enables character data from speech recognition to be entered and manipulated. 
(Column 2, Lines 45 to 55) It would have been obvious to one having ordinary skill in 
the art to highlight words having a predetermined likelihood of misinterpretation as 
suggested by Mitchell et al. in the speech recognition system of Young et al. for the
purpose of permitting any application running on a processor to enable speech 
recognition. 
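
The threshold-based highlighting attributed to Mitchell et al. can be illustrated with a minimal sketch. The function name `highlight_low_confidence`, the asterisk marker, and the 0-to-1 score scale are hypothetical choices for this example and are not taken from the reference.

```python
from typing import Iterable, Tuple


def highlight_low_confidence(scored_words: Iterable[Tuple[str, float]],
                             threshold: float,
                             marker: str = "*") -> str:
    """Wrap each word whose score is below the threshold in a marker."""
    rendered = []
    for word, score in scored_words:
        if score < threshold:      # likely misrecognized: flag for review
            rendered.append(f"{marker}{word}{marker}")
        else:
            rendered.append(word)
    return " ".join(rendered)


line = highlight_low_confidence(
    [("place", 0.9), ("the", 0.95), ("stent", 0.4)], threshold=0.5)
```

In this example only the third word scores below the threshold, so it alone is marked for the user's attention.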

Claims 18 and 63 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Young et al. in view of Thelen et al. ('551) as applied to claims 1 and 46 above, 
and further in view of Baker et al. 

Young et al. omits a meaning variants database and a synonym lexicon. 
However, it is known in speech recognition to utilize a thesaurus. Baker et al. teaches a
reference source 40, which includes a dictionary and thesaurus ("meanings variants 
database" and "synonym lexicon"). (Column 15, Lines 5 to 8) It is stated that problems 
with prior art recognition systems are avoided by performing semantic and linguistic 
analysis through language knowledge. (Column 4, Line 64 to Column 5, Line 8) It 
would have been obvious to one having ordinary skill in the art to utilize a thesaurus as 
taught by Baker et al. in the speech recognition system of Young et al. for the purpose
of avoiding prior art problems through language knowledge. 

Response to Arguments 

Applicants' arguments filed 01 May 2006 have been considered but are moot in 
view of the new grounds of rejection, necessitated by amendment. 

Conclusion 

The prior art made of record and not relied upon is considered pertinent to 
Applicants' disclosure.

Smith and Di et al. disclose related art. 

Applicants' amendment necessitated the new grounds of rejection presented in
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicants are reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Martin Lerner, whose telephone number is (571) 272-7608.
The examiner can normally be reached on 8:30 AM to 6:00 PM Monday to
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David R. Hudspeth, can be reached on (571) 272-7843. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Martin Lerner
Examiner 